K-Nearest Neighbor Classification on Spatial Data Streams
Authors
Abstract
Classification of spatial data has become important because huge volumes of spatial data are now available, holding a wealth of valuable information. In this paper we consider the classification of spatial data streams, where the training dataset changes often: new training data arrive continuously and are added to the training set. For these types of data streams, building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. KNN is extremely simple to implement and lends itself to a wide variety of variations. KNN is a lazy classifier, in the sense that there is no training phase in the classification process; it does not build a classification model in advance. The traditional k-nearest neighbor classifier finds the k nearest neighbors under some distance metric by computing the distance from the target data point to each point in the training dataset, and then determines the class from those nearest neighbors by some voting mechanism. KNN classifiers have one drawback: classification time increases significantly relative to non-lazy methods. To overcome this problem, we propose a new method of KNN classification for spatial data streams using a new, rich, data-mining-ready structure, the Peano-count-tree or P-tree. In our method, we merely perform some logical AND/OR operations on P-trees to find the nearest neighbor set of a new sample and to assign the class label. We have fast and efficient algorithms for AND/OR operations on P-trees, which reduce the classification time significantly compared with traditional KNN.
1 Patents are pending on the bSQ and Ptree technology.
2 This work is partially supported by NSF Grant OSR-9553368, DARPA Grant DAAH04-96-1-0329 and GSA Grant ACT#: K96130308.
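The traditional KNN procedure that the abstract describes (store the training data, measure the distance from the target to every stored point, then vote among the k nearest) can be sketched as follows. This is only an illustration of the baseline, not the P-tree method proposed in the paper. The class name `StreamingKNN`, the Euclidean distance, and the simple majority vote are assumptions chosen for the example.

```python
from collections import Counter
import math


class StreamingKNN:
    """Lazy k-nearest-neighbor classifier over a growing training set.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # list of (feature_tuple, label) pairs

    def add(self, features, label):
        # New training data simply extend the stored set; no model is rebuilt.
        self.samples.append((tuple(features), label))

    def predict(self, target):
        if not self.samples:
            raise ValueError("no training samples have been added yet")
        target = tuple(target)
        # Rank every stored sample by its distance to the target
        # (Euclidean distance is an assumption; any metric could be used).
        ranked = sorted(self.samples, key=lambda s: math.dist(s[0], target))
        nearest = ranked[: self.k]
        # Majority vote among the k nearest neighbors.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

Adding a sample costs almost nothing, which is why KNN suits a changing training set, but every call to `predict` scans and sorts the whole stored set. That per-query scan is the classification-time cost the abstract identifies as the problem.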
Similar resources
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a variety of library resources. However, classifying documents within a large amount of data is still an issue, and finding particular documents demands time and energy. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
k-nearest Neighbor Classification on Spatial Data Streams Using P-trees
Classification of spatial data has become important due to the fact that there are huge volumes of spatial data now available holding a wealth of valuable information. In this paper we consider the classification of spatial data streams, where the training dataset changes often. New training data arrive continuously and are added to the training set. For these types of data streams, building a ...
k-Nearest Neighbor Classification on Spatial Data
Classification of spatial data streams is crucial, since the training dataset changes often. Building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. KNN is extremely simple to implement and lends itself to a wide variety of variatio...
Detection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms
Because acquiring field reference data with conventional methods is limited and time-consuming, in recent years it has become commonplace to generate reference data for forest studies from terrestrial laser scanner data, aerial laser scanner data, radar and optical data, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...